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THE ADAPTIVE PATCHED CUBATURE FILTER AND ITS 
IMPLEMENTATION 


WONJUNG LEE * AND TERRY LYONS t 

Abstract. There are numerous contexts where one wishes to describe the state of a randomly 
evolving system. Effective solutions combine models that quantify the underlying uncertainty with 
available observational data to form scientifically reasonable estimates for the uncertainty in the 
system state. Stochastic differential equations are often used to mathematically model the underlying 
system. 

The Kusuoka-Lyons-Victoir (KLV) approach is a higher order particle method for approximat¬ 
ing the weak solution of a stochastic differential equation that uses a weighted set of scenarios to 
approximate the evolving probability distribution to a high order of accuracy. The algorithm can be 
performed by integrating along a number of carefully selected bounded variation paths. The iterated 
application of the KLV method has a tendency for the number of particles to increase. This can be 
addressed and, together with local dynamic recombination, which simplifies the support of discrete 
measure without harming the accuracy of the approximation, the KLV method becomes eligible to 
solve the filtering problem in contexts where one desires to maintain an accurate description of the 
ever-evolving conditioned measure. 

In addition to the alternate application of the KLV method and recombination, we make use of 
the smooth nature of the likelihood function and high order accuracy of the approximations to lead 
some of the particles immediately to the next observation time and to build into the algorithm a 
form of automatic high order adaptive importance sampling. 

We perform numerical simulations to evaluate the efficiency and accuracy of the proposed ap¬ 
proaches in the example of the linear stochastic differential equation driven by three dimensional 
Brownian motion. Our numerical simulations show that, even when the sequential Monte-Carlo 
methods poorly perform, the KLV method and recombination can together be used to approximate 
higher order moments of the filtering solution in a moderate dimension with high accuracy and 
efficiency. 
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1. Introduction Filtering is an approach for calculating the probability distri¬ 
bution of an evolving system in the presence of noisy observations. The problem has 
many significant and practical applications in science and engineering, for example 
navigational and guidance systems, radar tracking, sonar ranging, satellite and air¬ 
plane orbit determination, the spread of hazardous plumes or pollutants, prediction 
of weather and climate in atmosphere-ocean dynamics [20l[2Tl[23l[Tl[T6l[Tlll2l[T5]. 
If both the underlying system and the observation process satisfy linear equations, 
the solution of the filtering problem can be obtained from the Kalman filter [2Q1EI]. 
For nonlinear filtering problems in finite dimension, there occasionally exist analytic 
solutions but the results are too narrow in applicability [5]. As a result, a number 
of numerical schemes have been developed toward an aim to accurately describe the 
fundamental object of interest in filtering, i.e., the conditioned measure, in terms of 
collection of weighted Dirac masses [Ulliaill]. 

When the underlying dynamics is a continuous process and the available obser¬ 
vations are intermittent in time, the standard approach of filtering is to perform a 
forward uncertainty quantification and then to incorporate data into this predicted 
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measure using Bayes’ rule in a sequential fashion. The former prediction step cor¬ 
responds to solving the Kolmogorov forward equation when the system is driven by 
Brownian motions. For the numerical integration of a stochastic differential equation, 
the sequential Monte-Carlo method uses sampling from random variables whose dis¬ 
tribution agrees with the law of the truncated strong Taylor expansion of the solution 
of an Ito diffusion. The algorithm usually gives lower order strong convergence of the 
probability distribution [22] . 

Instead of randomly simulating Wiener measure as in the sequential Monte-Carlo 
method, the KLV method at the path level replaces Brownian motion by a weighted 
combination of bounded variation paths while making sure that expectations of the 
iterated integrals with respect to these two measures on Wiener space agree up to 
a certain degree. Then the particles are pushed forward along the deterministically 
chosen paths to yield a weighted discrete measure. The KLV method is of higher 
order with effective and transparent error bounds obtained from the Stratonovich- 
Taylor expansion of the solution of a stochastic differential equation [31] . 

It is intrinsic to the KLV method that the number of particles increases when the 
algorithm is iterated. Therefore its successive application without an efficient suppres¬ 
sion of the growth of the number of particles cannot be used to filter the ever-evolving 
dynamics. Given a family of test functions, one can replace the original discrete mea¬ 
sure by a simpler measure with smaller support whose integrations against these test 
functions agree with those against the original measure. Recombination achieves the 
reduction of particles in this way using the polynomials as test functions [29] . One 
advantage of recombination is its local applicability in space. Therefore one can di¬ 
vide the set of particles into a number of disjoint subsets and recombine each subset 
of discrete measure separately, a process which we call the patched recombination. 
The dynamic property of patched recombination, if an efficient classification method 
is provided, leads to a competitive high order reduction algorithm whose error bound 
can be obtained from the Taylor expansion of the test function. 

One can use the alternate application of the KLV method and patched recom¬ 
bination as an algorithm for the prediction step in filtering. However the cost of 
this non-adaptive method would become extremely high particularly in high dimen¬ 
sion. Therefore we further modify the algorithm so that it can significantly reduce 
the computational efforts. More precisely, we exploit the internal smoothness of the 
likelihood to allow some particles to immediately leap to the next observation time 
provided certain conditions are fulfilled. The bootstrap reweighting is subsequently 
applied to obtain our non Monte-Carlo particle approximation of the optimal filter. 

The paper is organised as follows. Section [2] introduces the filtering problem 
and the Bayesian filter as its formal solution. In section [S] a prototypical sequential 
Monte-Carlo filtering algorithm and one of its clever variants that adapts impor¬ 
tance sampling are described. The rest of the paper is devoted to develop two non 
Monte-Carlo particle filtering algorithms that retain the strengths and mitigate the 
weaknesses of these existing Monte-Carlo methods. In order to do that, two essen¬ 
tial building blocks, cubature measure on (infinite dimensional) Wiener space and 
cubature measure on a finite dimensional space, are introduced in sections [J] and [S] 
respectively. In section [5] we define the main algorthms and in section [3 we per¬ 
form numerical simulations to validate the algorithms. Concluding discussions are in 
section [U 

2. Bayesian filter Suppose that the V-dimensional underlying Markov pro¬ 
cess X{t), tGK+U{0}, and the V'-dimensional observation process Vi, nGN, as- 


Wonjung Lee and Terry Lyons 


3 


sociated with Xn = X{nT) are given, for some inter-observation time T>0. Let 
Yi-n'= ,Yn'} be the path of the observation process and yi:n'= {yi,---,?/«'} 

be a generic point in the space of paths. We define the measure of the conditioned 
variable Xn\Yi.n' by 7r„|„/(da;n) =P(Xjj G dxn\Yi-,n' = yi:n>)- Assuming the law of A(0) 
is given, filtering is to find 7r„|„ for all n> 1. 

This intermittent data assimilation problem can in principle be solved by the alter¬ 
nate application of the prediction, to obtain the prior measure T^n\n-i from 7r„_i|„_i, 
and the updating, to obtain the posterior measure 7r„|„ from 7r„|„_i. If the transition 
kernel K{dxn\xn-i) and the likelihood function y(y„|a;„), satisfying 

P(A„G A|A„_i=a:„_i)= / K{dxn\xn-i), 

JA 

P(LnGl?|A^—^n)— / didJul^n) dyni 
Jb 

for all the Borel cr-algebra, and ), are given, the prediction and 

the updating are achieved by 


— d^{dXn\Xn—l^7Tj^_i^j^_i {dXn—l') , 

9{yn\Xn)T^n\n-l{dXn) 




/ g{yn\Xn)T^n\n-l{dXn) ’ 


( 2 . 1 ) 

( 2 . 2 ) 


respectively. Eq. (1^ is Bayes’ rule and the recursive scheme dll]), (122]) is called a 
Bayesian filter. 

3. Particle filtering 

3.1. Weak approximation The closed form of 7r„|„/ is in general not available. 
In many cases the essential properties of a probability measure we are interested 
can accurately be described by the expectation of test functions. If the class of test 
functions is specified, we can replace the original measure with a simpler measure that 
integrates the test functions correctly and hence still keeps the right properties of the 
original measure. Therefore efforts have been devoted to weakly approximating 7r„|„/ 
by finding an efficient way to compute E(/(A„)|Yi.„/) = f f {Xn)'^n\n'(dxn) accurately 
for a sufficiently large class of / : -A R. We mention that the class of test functions 

is not given in the filtering problem. However their choice is quite critical as it 
affects the notion of an optimal algorithm and controls the detailed description of the 
conditioned measure. 

One of the methodologies for the weak approximation is to employ particles whose 
locations and weights characterise the approximation of the conditioned measure. 
More precisely, a particle filter is a recursive algorithm that produces 


A^n|n' 


(3.1) 


approximating 7r„| 

(TTnln’J) by (tT^F 
is used. 


„/, where 6^ denotes the Dirac mass centred at x. One approximates 
o/) = E^i”' the notation (tt,/) = //(a;)7r(da;) 
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3.2. Sequential Monte-Carlo methods Particle approximation is widely 
used in Monte-Carlo frameworks. We here introduce two representative algorithms, 
the sampling importance resampling (SIR) suggested in [17] and the sequential impor¬ 
tance sampling and resampling (SISR) algorithm [30l |35l [13] . The number of particles 
does not have to be equal in each step, but it is here fixed by M„|„/ = M for simplicity 

m- 


3.2.1. Sampling importance resampling (SIR) The prediction step is 
achieved by using (T^n\n-iTf) = {'^n-i\n-iiKf) from Eq. (12.11) . Given the empirical 
measure ^ approximating 7r„_i|„_i, one performs indepen¬ 
dent and identically distributed (i.i.d.) sampling drawn from K{dxn\x\_^^_i). 

Then | ^ is an empirical measure with respect to 7r„|„_i. 

For the updating step, Eq. implies (7r„|„,/) = (7r„|„_i,/g«”)/(7r„|„_i,5?^") 

where the notation = is used. Define the bootstrap reweighting operator 


REW 





(3.2) 


then 7r®|)^ = REWis an approximation of 7r„|„. 

In order to prevent degeneracy in the weights, caused by a successive application 
of Eq. (13.21) . one approximates the weighted discrete measure by an equally 
weighted discrete measure |12| . Random resampling performs M independent samples 
from it®®. This process might introduce a large Monte-Carlo variation 
and work has been done to reduce the variance H Hj. The resulting one tt®® = 
I is an empirical measure with respect to 7r„|„. 

The SIR algorithm can be displayed by 


-SIR 


I—>■ tt; 


SIR 
iln-1 ' 


-SIR . -SIR 
’^n\n 


(3.3) 


where the notation i—>■ is used for moving particles forward in time, => for reweighting 
and —>■ for random resampling. The algorithm is very intuitive and straightforward to 
implement. Further, it produces an approximation that converges toward to the truth 
posterior measure as the number of particles increases However, SIR might be 
inaccurate when 7r®®_j^ is far from 7r„|„ in the sense that bootstrap reweighting gen¬ 
erates importance weights distributed with a high variance. The following algorithm 
modifies SIR to get over this degeneracy problem to some extent. 

3.2.2. Sequential importance sampling and resampling (SISR) Given 
the unweighted measure ^ approximates 7r„_i|„_i, 

one performs i.i.d. sampling K{dxn\xl^_i\n-i:yn) instead of 

K(dxn\x'!^_^^_^). Here the new transition kernel K can depend on the instance 
Dn and should be chosen in a way that the distribution of 7r®|® ^ ^ is 

closer to 7r„|„ than 7r®|))_j^ in the above-mentioned sense |13] . 

Note that 7r®|®();j^ is not distributed according to 7r„|„_i. To account for the effect 
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of this discrepancy, the expression 


— 1 € dx^i—X , € dXji — yi'.n) 

_ w{Xn-l,Xn,yn)K{dXn\Xn-l,yn)'n-„-l\n-l{dXn-l) 

f w(Xn-i,x„,y„)K(dx„lx„_i,y„)Tr^_iij,_i(dXn-i) 


(3.4) 


where 


w(x„_l,Xn,yn)0C 


g(ynlXn)dC(dXnlXn-l) 
K(^dXn l^n —1; yn) 


is used. Replacing K(dxn\xn-i,yn)T^n-i\n-i{,dxn-i) in Eq. (13.411 by its empirical 
approximation and integrating over Xn-i, one obtains = ^ ^ where 

A random resampling from yields the empirical 

measure with respect to TTn\ni denoted by 

If K{dxn\xn-i,yn) and w{xn-i,Xn,yn) have better theoretical properties than 
K{dxn\xn-i) and g{yn\xn) such as better mixing properties of K{dXn\Xn-i,yn) or 
flatter likelihood, then the algorithm can produce a better approximation. Because 
one ne^s to integrate an evolution equation of a Markov process with transition 
kernel K in any practical implementation, designing efficient particle filtering methods 
is equivalent to building an appropriate dynamic model that has good theoretical 
properties while keeping the same filtering distributions. The SISR algorithm 


SISR 


I—>■ TT 


SISR 

n|n—1 


-SISR 

n|n 


—>• TT 


SISR 

n\n 


(3.5) 


might use fewer particles than SIR to achieve a similar accuracy [40] . One can find a 
considerable study illustrating the difference in performance of SISR using different 
proposal distributions in m- 

4. Kusuoka-Lyons-Victoir (KLV) method Suppose that a random vector 
X (t) G evolves according to a Stratonovich stochastic differential equation (SDE) 


d 

dX{t) = Vo{X{t))dt + '^V^{X{t))odWi{t) (4.1) 

2=1 


where {Vt G C^(]R^,K^)}(Lq is a family of smooth vector fields from to with 
bounded derivatives of all orders, and W = ,Wd) denote a set of Brownian 

motions, independent of one another. The KLV method enables to deterministically 
approximate the law of X{T) in terms of discrete measure. 

4.1. Cubature on Wiener space on path level Let us use the notations 
Wo{t) = t, u}T,o{t)=t and I = {ii,-■■ ,ii) G {O,--■ ,dy. Consider the iterated integral 
with respect to W = {Wi,--- ,Wd), 

jlTioW)= [ odWM---odW„iti), 

and the iterated integral with respect to a continuous path of bounded variation 

lot = j^T,d) ■ [0,r] — 

= f 

do<ti<---<ti<T 


dtOTdi (ti)---diOTdi {ti ). 




6 


The adaptive patched cubature filter and its implementation 


Recall that Wiener space Cq is the set of continuous functions starting at 

zero. We define a discrete measure Qt — supported on continuous paths 

of bounded variation to be a cubature on Wiener space on path level of degree m with 
respect to the Wiener measure P provided the equation 

Ep =Eq™ {JIt{°W)) 

Tim 

= Ajv7o,t(^t) (4-2) 

j=i 

holds for all I satisfying 11/|| = Z + card{j,ij = 0} < m. Note Q™ is obtained from Q™ 
via a suitable rescaling and the existence of Q™ with rim < card{/: ||/|| < to} is proved 
in [5T] . 

The cubature measure on Wiener space can be used to approximate Ep(/(X|.)) 
for the random process Xf in N dimension satisfying 

d 

dX^ = Vo {X^)dt + ^ R, (Xf) o dW, (t) (4.3) 

and Xq =x. The expectation of /(Xf) against Wiener measure can be viewed as an 
integral with respect to infinite dimensional Wiener space. 

Let for tG [0,A] be the deterministic process satisfying 

(4.4) 

2^0 

X 

and Xq ' ^ = x. The ordinary differential equations (ODEs) of Eq. (14.41) are obtained 
from replacing the Brownian motions W in Eq. (14.31) by the bounded variation path 
The measure is called the cubature approximation of the 

law of Xy at the path level. 

An error estimate for the weak approximation of this particle method can be 
derived from the Stratonovich-Taylor expansion of a smooth function /, 

/(X^)= ^ Vjfix)JlrioW) + Rm{x,TJ) (4.5) 

||/||<m 

where the remainder Rm{x,T,f) satisfies 

m+2 

sup v'Ep(i?™(x,r,/)2)<C y r/2 sup ||E//||oo (4.6) 

Pll=i 

for a constant C depending on d and to [22]. Here the vector field Vi = (Ei,i,--- ,4^,iv) 
is used as the differential operator Vi = Y^-^^Vijdxj and Vi denotes Vi^-'-Vi,^. 

The process Rm{x,T,f) further satisfies 


m+2 

sup EQm(|i?™(a:,r,/)|)<C' y sup ||E//||oo 

i=m+l 11^11=* 


(4.7) 
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for a constant C depending on d, m and Q™ [SI]- Let the operators Pt and Q™ be 
defined by PTf{x)=E,p{f{X^)) and QJpf{x) = EQm{f{X^)). Then the error bound 
of the cubature approximation at the path level is given by 


sup 


' ^ TTl 

Ep(/(X^))-£A,/(Xr") 

i=i 


= 11 {PT-Q^)f Woo 

m+2 

<c r/2 sup WVifWoo 

i=m+l 


(4.8) 


for smooth /, from Eq. (1121) and Eqs. (H31) . (gH), (HTl) . 

The algorithm was developed by Lyons, Victoir |31j . following the work of 
Kusuoka mills], so it is referred to as the KLV method. Eq. (14.81) leads to define 

C n \ n Um 

(4.9) 

i=l / i=li=l ^ 

that may be interpreted as a Markov operator acting on discrete measure on K'^. 

In the following, assume T G (0,1) for simplicity. One may take a higher degree m 
to achieve a given degree of accuracy in Eq. (1121) . An alternative method to improve 
the accuracy of the particle approximation is a successive application of the KLV 
operator. Let (D = {0 = to< < •• • < ife = L"} be a partition of [0,r] with sj =tj—tj-i. 

Instead of f{x) = {K\X^'^'^ {6x,T),f), the value of Pt f {x) = Pa-i^Ps 2 ■ ■ ■ Psk f {P) can 
accurately be approximated by a multiple step algorithm Q^Q^---Q^f(x). 

Given a discrete measure we define a sequence of discrete measure by 

■ / ^ -1 (4-10) 

$™’y/) = KLV(™)($™’^-'(/),s,) l<j<k 

that can be viewed as Markov chain. The inequality 


PT/(x)-($™’'=(d,),/) 


k 

f=l 


E {^7'~\Sx),{Pa,-QT,)PT-tj) 

k 


<T.\\(P^^-QT,)PT-tjW 

j=l 


(4.11) 


obtained from the Markovian property of the KLV operator shows that the total error 
of the repeated KLV application is bounded above by the sum of the errors over the 
subintervals in the partition. Applying Eq. (14.81) to estimate the upper bound of 
Eq. (14.111) . we need Px-t^f to be smooth. When / is smooth, this is true and the 
error bound 


sup 


m+2 k 

PTf{x)-{<^^’'^{Sx)J) 

i—rn-\-lj—l 


^^E E 


i/2 


sup II K/Pr-q/lloo 
p||=i 
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is obtained from Eqs. (14.8p . (14.111) . 

The case of Lipschitz continuous / is of particular interest because Ptf is indeed 
smooth in the direction of with additional conditions for these vector fields 

[B]. In this case, the regularity estimate 

llv/lloo (4.12) 

holds for all tG(0,l], where C is a constant independent of / [571 US]. Combining 
Eqs. (|4.8I) . (14.111) and Eq. (14.121) . we obtain an error estimate for the KLV method in 
terms of the gradient of the Lipschitz continuous /, 


sup 

xeR-''' 




T/2 


m+2 k — 1 

E E 




i—7n-\-l j 




(4.13) 


where C is a constant independent of k. Here the final term in the upper bound of 
Eq. dHU is estimated by || {Ps,-Q7Jf l|oo<|| PsJ-f ||oo + || f-Q7J\\oo< Cs]!^ || 
V/ ||oo using the boundedness of 

Let = be the Kusuoka partition [53] given by 




(4.14) 


then the error estimate 


sup 

xGR" 


PT/(a:)-($™^^)(4),/) <C||V/||ooTi/2fc-(™-i)/2 (4.15) 


is satisfied for a Lipschitz continuous / when 7>?7i — 1. 

Eq. (14.151) is obtained from substituting the non-equidistant time discretisation 
(D( 7 ) into Eq. (14.131) . Using this particular choice of partition ensures that the bound 
of the KLV error is of high order in the number of iterations k. 

Before concluding this subsection, we here mention that u(x,t) =Ep(/(Vy_()) 
satisfies the partial differential equation (PDE) 


u{x,T) = f{x). 


(4.16) 


where {U}(Lo differential operators |3T]. Therefore Pxfix), the heat ker¬ 

nel applied to /, is equal to the solution u(x,0) of Eq. (I4.16p . Due to this inherent 
relationship between parabolic PDEs and SDEs, one can apply any well-known al¬ 
gorithm for the solution of Eq. (I4.16P to the prediction step of the filtering problem 
where the underlying system is given by Eq. 631. However it is very important to 
understand the critical difference between these two problems. One needs to weakly 
approximate the law of X{T), when V(0) is given by Sx, that accurately integrate the 
test function / for the PDE problem while the filtering problem requires one to ap¬ 
proximate the conditioned measure of for all n> 1, in which the test function 

is not at all specified. 
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4.2. Cubature on Wiener space on flow level We here study the construc¬ 
tion of cubature formula . Meanwhile cubature on Wiener space on flow level is 
defined in terms of Lie polynomial and used to develop an approximation based on 
the autonomous ODEs at flow level. 

Let {ei}f^Q be the standard basis of Let T denote the associative and non- 

commutative tensor algebra of polynomials generated by {ei}f_Q. The exponential and 
logarithm on T are defined by 

exp(a) = ^—, 

(4.17) 

log(a) = log(ao) + ^-^-(-1) , 


where a = and e/ = (8)• • -(8)6^, for a multi-index ,b) G {0, -• • ,dy. 

Here 0 denotes the tensor product. Let the operators exp*^'"^(-) and log*-™^(-) be 
defined by the truncation of Eq. (14.1711 leaving the case || /1|< m. 

The signature of a continuous path of bounded variation ujt : [0,T] —by 


5, 


CXD „ 

i , t ( wt ) = ^^ / dujT{ti)®---®duJT{ti) 

= y^Jo.T{^T)e-i 


and similarly the signature of a Brownian motion W by 

5o,T(oW) = ^y(_r(oW)e/. 

/ 

In view of Eq. 621), the definition of cubature on Wiener space of degree m can be 
rephrased by 

Ep (oW)) = Eqp (oW)) (4.18) 

where {■) is the truncation of 5 o,t(-) to the case || I\\<m. 

Define C to be the space of Lie polynomials, i.e., linear combinations of finite 
sequences of Lie brackets [ei,ej]=ei'Siej — ej^ei. Because Chen’s theorem ensures 
that the logarithm of signature is a Lie series |37] . its truncation 

Cy = log^^\So.T{t^y)) (4.19) 

is a Lie polynomial and an element of C. Then the measure Qt—12^=1 sup¬ 

ported on Lie polynomials satisfies 

Ep(4J(oIT))=Eq. (expW(£)) 

rim 

= ^Ajexp(™)(/:^j,). (4.20) 

i=i 


Conversely, for any Lie polynomials C^, there exists continuous bounded variation 
paths oj^ whose truncated logarithmic signature is Ctp. Moreover if Q™ satisfies 
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Eq. (I4.20L then Q™ satisfies Eq. (j4.18l) . Therefore Eq. (I4.18P and Eq. (14.201) are 
equivalent. The discrete measure Q™ is defined as cubature on Wiener space on flow 
level. 

The expectation of the truncated Brownian signature is 


Ep(5^™^(oW))=exp('") 



(4.21) 


which is proved in |31j . It is immediate from Eq. (14.211) that cubature formulae 
on Wiener space for TO = 2n—1 and m = 2n are equal to one another. A formula 
satisfying Eq. (14.201) is found when m = 3 and m = 5 for any d [3T]. In 
some cases of to> 7, cubature formula of Lie polynomial is available when d=l,2 (See 

m)- 

From this Q™ and Eq. (14.191) . one can construct Q™ (See [3TJ [18]). It fol¬ 
lows from the scaling property of the Brownian motion that ^it) = uji Q(t) and 
-(t) = -\/T'a;( j(t/T) for l<i<d. The paths define a cubature formula Qjp. Us¬ 
ing Eq. (|4.19p . the scaling of the Lie polynomial is 

£y = (T,£j) where (fj^jO/e/) = The Lie polynomials define a cuba¬ 

ture formula Q™. 

We next study the approximation based on the flows of autonomous ODEs. It 
is in fact corresponds to a version of Kusuoka’s algorithm [24]. Let T denote the 
algebra homomorphism generated by T(ei) = Vi for i = 0,---,d. For a vector field 
V £ (^“(K^jR^), we define the flow Exp(fU) (cc) = ^f to be the solution of the ODE 
d^f = V{^f)dt with =x. By interchanging the algebra homomorphism T with the 
exponentiation (so far taken in the tensor algebra) we arrive at an approximation 
operator in which the exponentiation is understood as taking the flow of autonomous 
ODEs. More precisely, one has 


/1 

Ep (r (o W)) ) /(x) = £ A,r (exp w (C^)) fix) 

i=i 

rim 

-^•^j7(Exp(r(£^)) (x)) 
i=i 

using Eq. (14.201) . The error introduced while interchanging exp and T operators turns 
out to be of the similar order with the error in the cubature approximation of the 
path level as shown below. 

Formally the cubature approximation at the flow level is defined as follows. Let 
X cj 

^ for t G [0,1] be the deterministic process satisfying 

=r{C^^)iXt’^'^)dt (4.22) 

and Xq'^^ = X. Define the operator 

N / » \ n Um 

KLV 

\i=i / i=ij=i 1 


(4.23) 
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and a sequence of discrete measure 



for l<j<k. 

~ —s_^(rn) 

Let (5™/(a;) = (KLV {6x,T),f ) be a flow level cubature approximation, then 


the Taylor expansions of Eq. (14.41) and Eq. (14.221) lead to 


\\m-Q^)f\\^<C Y. (4.24) 


m+l<||/||<2m 


for a smooth /, where C is a constant depending on m, d, Q™ and Q™ [Mj. 
The error estimate 


sup PT/(a:)-(4>™;^(4),/) <C||V/|Uri/2fc-(—1)/2 (4.25) 


is satisfied for a Lipschitz continuous / when 7 >m—1. 

Eq. (14.251) is obtained using Eq. (14.241) and demonstrates that for a suitable par¬ 
tition the bounds for the approximation at flow and path level have the same rate of 
convergence in view of Eq. (14.151) . 

5. Simplification of particle approximation A successive application of 
the KLV operator gives rise to geometric growth of the number of particles in view of 
Eqs. (14.91) and (14.231) . Except some cases of PDE problems in which the KLV method 
can produce an accurate approximation with small number of iterations, this geomet¬ 
ric growth of particle number prohibits an application of the KLV method particularly 
to the filtering problem where to maintain an accurate description of the ever-evolving 
measure with reasonable computational cost is the key requirement. It is therefore 
necessary to add a simplification algorithm between two consecutive iterations, which 
suppresses the growth of the number of particles in the KLV framework. Though it 
is possible to achieve the simplification through one of several Monte-Carlo methods, 
we here make use of cubature measure on a finite dimensional space to efficiently 
reduce the support of discrete measure. This will let the entire algorithm consistently 
step outside of the Monte-Carlo paradigm. Furthermore, its proper applications never 
harm the accuracy of the KLV approximation as we shall see. 

5.1. Cubature on a finite dimensional space Let i/ he a (possibly 
unnormalised) positive measure on A discrete measure ='^^hiWj6yj is 

called a cubature (quadrature when A^ = l) of degree r with respect to v provided 
supp(9^’’^) C supp(z^) and (iz,^) equals for all polynomials q 

whose total degree is less than or equal to r. It is proved that a cubature with 
respect to an arbitrary positive measure v satisfying rir < exists [35] . As a 

result, one can adopt a cubature measure on R.^ with respect to the original measure 
as the reduced measure. 

Importantly, an error bound of {v,F) — = {y — v^F^f) for a smooth func¬ 

tion F : R^ —>■ R can be obtained from the Taylor expansion. The value of F at 
x= (cci,--- ,xjv) is written as 



(5.1) 
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where a= (ai,---,ajv); |Qf| = aiH-ha^v, = D°‘= ■ ■ ■ dx'^, x°‘ = 


and 



(5.2) 


for some x* SR^. If the support of is in a closed ball of centre xo and radius u, 
denoted by B{xo,u), then we have 


|(i.-?W,F)| = < 2(:.,1) II |U„(B(.o.^)) 



Eq. (15.31) reveals that cubature on a finite dimensional space is an approach for the 
numerical integration of functions on finite dimensional space with a clear error bound. 

5.2. Local dynamic recombination Instead of using a cubature of higher 
degree to reduce the entire family of particles all at once, we improve the performance 
by dividing a given discrete measure into locally supported unnormalised positive 
measures and replacing each separated measure by the cubature of lower degree [55] . 
This so-called local dynamic recombination can be a competitive algorithm because 
each reduction can be performed in a parallel manner to save computational time and 
the error bound from the Taylor approximation remains of higher order. 

Let U = be a collection of balls of radius u that covers the support of 

discrete measure /r on then one can find unnormalised measures such that 

= U^li fj,i {fj,i and /ij have disjoint support for i^j) and supp(/ii) C [/jnsupp(/x). 
In this case, we define the patched recombination operator by 


R 



(5.4) 


^(r) 

where p.) ' denotes a cubature of degree r with respect to 

Given a discrete measure pP, we define a sequence of discrete measure by 



(5.5) 


for l<j<k. An application of Eq. (1531) with initial condition 5x yields a weak 
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approximation for the law of X^. One obtains the estimate 






T — ti- 


f)-{^V:kr)iS.),PT-tj) 




K 

n 

i=i 

i=i 

k 

<Y.\\iPs,-QT,)PT-tJ\\o. 


i=i 


k-1 


+ 


E I - Kkr)i5^lPT-tj) \ (5.6) 


1=0 

where the first sum of the upper bound is due to the KLV approximation. The second 
sum is the error caused by the recombination. 

Suppose that / is Lipschitz continuous. The smoothness of Ptf leads to 


sup 


'^Ziur)iS.)-<^>viur)i^^)’PT-tj) <Cu^/^^ SUp \\ Pr-t, f \\oo (5.7) 

Z \a\=rj + l 


for 0 < j < fc — 1, where Eq. (lOD and the triangle inequality are used. Like the case 
of Eq. (14.121) . a suitable condition on {Vi}f^Q ensures there exists a positive integer 
p G N such that 

sup WD'^PtfWoo^Ct-^^/^WVfW^ (5.8) 

|Q:|=r+l 

for all t£ (0,1]. When Eqs. (14.121) and (15.81) are satisfied [521 USE], one obtains 


sup 

KeR''' 


P 


>T/(x)-($-f„„)(4),/) 


m+2 k — 1 




k-1 


.r-i+l 


< E E 


+C2E- 


i—m+l j 




|V/||oo (5.9) 


from Eqs. (14.13^ . (15.71) . Here Ci and C 2 are constants. 

The recombination error can be controlled by the radius of the ball Uj and the 
cubature on degree rj. By choosing an appropriate pair {uj,rj), one can make 
the order of the recombination error bound not dominant over the order of the error 
bound in the KLV method. For example, in the case of (uj,rj) = |"m/p]) where 

a = (p—l)/(2(|'m/p] +1)) ([a:] denotes the smallest integer greater than or equal to 
x) or = /{T the error estimate 


sup 

xeR.''' 


P 


y(x) - ($™^),(„,,)(4),/) I < c IIV/ lu 1)/2 (5.10) 
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is satisfied for a Lipschitz continuous / when 7 >m—1. Eq. (jb.lOp is obtained from 
substituting the partition defined in Eq. (I4.14p into Eq. (j5.9l) and shows the same 
convergence rate with the ones without recombination, Eqs. (14.151) and (14.251) . 

6. Patched cubature filter and Adaptive patched cubature filter Recall 
X(t) G is governed by 


d 

dX{t) = Vo{X{t))dt + '^V^{X{t))odWi{t). (6.1) 

Let the noisy observations Yn associated with Xn =X{nT) satisfy 

Yn = <f{Xn)+rin, r]n^Af{0,Rn) (6.2) 

where ip G ) and realisations of the noise ijn are i.i.d. random vectors in 

For a deterministic particle approximation of the optimal filtering solution of 
Eqs. ()6.1I) and (lOD . we employ the KLV method and recombination to define the 
patched cubature filter (PCF) in subsection l6.1l and and the adaptive patched cubature 
filter (APCF) in subsection 16.21 We address several issues encountered during their 
practical implementations in subsection 16.31 

6.1. Patched cubature filter (PCF) Let TTn\n' be the law of the conditioned 
variable and be a discrete measure approximation of the law of A(0). 

We define the patched cubature filter (PCF) at the path level by the recursive algorithm 


PCF /'.,rPCF \ 

^n|n—1 I’T'—l’ 

7rP7=REw(^P|7i,5""), 


(6.3) 


for n> 1. The algorithm can be stated as the following. 

1. One breaks the measure into patches and performs individual recombination 
for each one. 

2. One moves given discrete measure forward in time using the KLV method. 

3. One performs data assimilation via bootstrap reweighting at every inter¬ 
observation time which might differ from the time step for the numerical 
integration. 

4. One again applies the patched recombination. 

Using in place of 6x in Eq. (15.6p . an error bound of the prior approxi¬ 

mation of the PCF is given by 


(7r„|„_i-7rP|7l,/) < {T^n-l\n-l,PTf)-{T^n-l\n-l^PTf) 


PCF 


-k 




PCF 


< 


{^n-l\n-l-<%n-l,PTf) +Y.\\ {Ps, - Q^)PT-tJ\\c 




k-l 


+ V I (®Si..-) 1) - KC) /) 


1=0 


(6.4) 
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One can use the same argument with the case of PDE problem to obtain a higher 
order estimate of the PCF. An error bound of the posterior approximation 


PCF 

‘n|n 


,/) 




< 


tl _ 

{T^n\n-l,gy^ 
1 


, ,gy^) 

^ n|n—1’^ ' 


ll/llo 


{,'^n\n — l^g^ 


(7r„|„-i-7rP|°^i,g2'") 


(6.5) 


is given in terms of an error estimate of the prior approximation. We stress that PCF 
does not include any Monte-Carlo subroutine and therefore its error estimate for the 
weak approximation can be of high order with respect to the number of iterations k 
in view of Fqs. (16.41) and (16.51) . 

Recall the path level operator KLV*-"*^ can be replaced by the flow level operator 

—(m) 

KLV without harming the order of accuracy. By doing this in the PCF at the path 
level, we define the PCF at the flow level by the successive algorithm that produces 


fPCF 
n|n—1 


and 


;PCF 
2|r 


6.2. Adaptive patched cubature filter (APCF) 

It would be worthwhile to mention that PCF (|6.3I) . like SIR (13.31) . performs the 
prior approximation without using the next time observational data. This naturally 
leads us to develop a variant of PCF that will share the common essential feature with 
SISR (j3.5p in the aspect that the observation process is involved in moving particles 
forward in time. 

In order to do that, we first consider a modification of the standard KLV scheme in 
which some particles are adaptively accelerated when it causes no significant difference 
in the integration of the test function. If the smoothness of the test function is 
not known in advance, the accuracy requirement of the KLV numerical approach 
leaves no choice other than to let the family of particles forward following the pre¬ 
specified partition until the next observation time. This is because, for truly irregular 
test functions, accurate integration would require exploration of the irregularities. 
However if the test function is smooth enough and the less regular set is of significantly 
lower dimension than the main part of the smoothness, then we are allowed to let the 
particles to go straight to the next observation time from some considerable distance 
back instead of the step predicted in the worst case which we would otherwise have 
used to terminate the algorithm. 

We build this insight into the practical algorithm. At each application of the 
KLV operator, the algorithm evaluates the test function using a one step prediction 
straight to the next observation time and compares this with the evaluation using a 
two step (one next step and the rest step to the next observation time) prediction. 
If two evaluations agree within the error tolerance, then the particles immediately 
leap to the next observation time. Otherwise the prediction will follow the original 
partition. 

In terms of accuracy, the approach is pragmatically rather successful because the 
opportunities for two (or three to break certain pathological symmetries) step predic¬ 
tion to produce consistent answers by chance is essentially negligible. Furthermore, 
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the adaptive switch for which the KLV is employed to move the prediction measure 
forward but move a part of it straight to the observation time whenever the relevant 
part of the test function is smooth enough has a very significant effect of pruning the 
computation and speeding up the algorithm due to the reduction of particles to be 
recombined at each iteration. 

This adaptive KLV method of course cannot be applied without a test function. 
Differently from the PDE problem, the test function is not specified in the filtering 
problem. Therefore in practice we take the smooth likelihood as test function to lead 
the adaptation. 

Recall V = = <tk=T} is a partition of [0,T] with sj =tj —tj-i. 

We use the likelihood to define the splitting operator acting on a discrete 
measure time tj-i. Let =KLY(Sxi,tj —tj-i), 

KLV(^i, 2 i,tfc —tj) and = K\N— Let Ir be the collection of index i 

satisfying 22:5^")! Then the discrete measure is the union of two 

discrete measure where simplic¬ 
ity, is defined to be the null set. The process defines the splitting operator 


splM 

for 1 < j < fc. 

Define a sequence of discrete measures as follows 

^m,0 / 0\ 0 

^T> 




=REc<v-. V-.) 

i-®:?.:;;?!/)= spl''’ (SxSfxi/i.s'-). 


$ 


T) 
m,j-l 




for Let (p") 






,(„X(/) = KLV(-)(kLV(-) ,T-t, 


( 6 . 6 ) 


(6.7) 


( 6 . 8 ) 


for 1 < J < fc — 1. 

We define the adaptive patched cubature filter (APCF) at the path level by 


( fc-i 

I I xm.j-l.fc / APCF \ 1 I L.^xAPCF t, 

I_I ^ |n-l 1 ’ 

j=l 

APCF_T}m’(AL/"xxAPCF v„\ 

^n\n ” \^^n|n —Itff 


(6.9) 


for n> 1. The algorithm can be stated as the following. 

1. One breaks the measure into patches and performs individual recombination 
for each one. 

2. One splits given discrete measure to lead some of the particles to the next 
observation time and the rest particles to the next iteration time using the 
KLV method. 

3. One performs data assimilation via bootstrap reweighting at every inter¬ 
observation time which might differ from the time step for the numerical 
integration. 
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4. One again applies the patched recombination. 

/ X '(m) 

Via replacing by KLV , we define the APCF at the flow level that 

produces and ^t^pcf gf ttApcf .^apcf^ 

^ n|n —1 n|n n|n—1 n|n 

In view of Eq. (ra . the likelihood is indeed a natural choice for the filtering 
problem in which the posterior measure is of primary interest. One can apply 
and simultaneously as the test function for the SPL operator in Eq. (EH) if one 
would like to obtain a posterior approximation that accurately integrates /. 

Note both SISR (13. 5|) and APCF (16.91) are built upon the same philosophy - mak¬ 
ing use of the observational information to lead the particles for a more accurate 
approximation of the posterior possibly at the expense of the accuracy of the corre¬ 
sponding prior approximation. However the way of modifying the basis algorithm is 
different from one another. In particular, while SISR leads the particles only using 
the instance of the observation yn, APCF fully uses the likelihood to achieve the 
adaptation. Furthermore, APCF cares the domain of importance without introducing 
a new dynamics. 

6.3. Practical implementation One has to specify the time partition and 
the way of patched recombination for the implementation of PCF and APCF. We here 
present adaptive partition and adaptive recombination as alternatives to the Kusuoka 
partition and the covering with fixed-size balls, respectively. Differently from prior 
suggestions, our ones are subject to the test function and thus called adaptive. 

Before doing that, we at this point mention that the work in [5] also employs cu- 
bature on Wiener space to solve the nonlinear filtering problem. Comparing these two 
kinds of cubature filters, one major difference is how to simplify the support of discrete 
measure between successive KLV applications to control computational cost. The one 
developed in [5] makes use of the Monte-Carlo scheme based on branching and prun¬ 
ing mechanism. The algorithm looks for a reduced measure whose distance from the 
original measure is minimised in some sense. Therefore the simplification procedure 
should be applied to the whole discrete measure all at once. On the contrary, PCF 
and APCF take the deterministic moment-matching recombination strategy, that can 
be applied locally in the support of measure for an enhanced efficiency. 

In addition to the algorithm characteristics, the problem setting in [8] is rather 
different from the current paper as the observation process is assumed to be not 
discrete but continuous (for more details we refer the reader to [28]). In this case, the 
time integration of the KLV method is performed along with even partition of small 
intervals. However, in case of sparse observations, the numerical integration until 
the next observation time requires multiple steps preferably with uneven partition of 
decreasing intervals rather than even partition. For PCF and APCF, the likelihood 
can serve as the test function and we can further utilise the presence of this test 
function to determine time partition. This is clearly one additional degree of freedom 
allowed in the cubature filter under the scenario of intermittent observations. 

6.3.1. Adaptive partition 

For a given test function /, one can make use of the heat kernel Pt as well as / to 
evolve the set of particles so that one step error is within a given degree of accuracy, 
i.e.. 


\\{Ps,-QZ)PT-tJ\\oo<e (6.10) 

for some e>0. We define an adaptive partition 'P{e,f) = {tj}^^Q to be a time dis¬ 
cretisation for which each Sj = tj — tj-i is the supremum among the ones satisfying 
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Eq. (I6.10|) . Because Ptf becomes smoother as t increases, the sequence tends 

to decrease monotonically, i.e., Si > S 2 > • • • > Sfe. The upper bound of the total error 
along with the adaptive partition is given by 


sup 




< ke 


( 6 . 11 ) 


from Eq. (|4.1ip . 

6.3.2. Adaptive recombination 

Consider the condition 




<e 


( 6 . 12 ) 


given some 0>0. We define the adaptive recombination by the algorithm that uses 
as large value of u as possible, for a fixed recombination degree r, among the ones 
satisfying Eq. (16.121) . The algorithm again makes use of the heat kernel Pt as well as 
the test function /. When the adaptive partition and the adaptive recombination are 
simultaneously used, the combination of Eqs. (16.101) and (16.121) yields 


sup 


Pt/(x)-($™^(4),/) <kie + 0) 


(6.13) 


from Eq. (lOl) . Notice, unlike the case of Eq. (15.101) where the constant C is not 
specified, the upper bound of Eq. (16.131) is explicitly under the control. 

It deserves to mention that the application of the adaptive recombination does 
not require to determine the size (and even topology) of patches in advance. Given the 
recombination degree, it suffices to keep shrinking the size of patches until Eq. (I6.12p 
is met. Due to this feature, the adaptive recombination can practically be useful in 
achieving the error bound of Eq. (I6.13|) when it is accompanied with an efficient algo¬ 
rithm that divides the support of discrete measure into local disjoint subsets. Because 
the detailed algorithm of the recombination can be found in [29] , we conclude this sec¬ 
tion with one way to achieve the adaptive recombination utilising the Morton ordering 
[22] ■ The methodology adopts boxes, instead of balls, as patches to locally cover the 
particles. The algorithm is advantageous particularly in case of high dimension. 

Given a number of particles in N dimension, we perform an affine transformation 
to map the particles into the ones in the box [0.5,1)^. In the following, we evenly 
divide each edge of the box by 2" to get 2"^ sub-boxes and assign the particles 
to these sub-boxes. We use the double-precision floating-point format in scientific 
computing: any number z* G [0.5,1) is saved in terms of where 6® is either 0 

or 1 in a way that z* = (l/2) x + (almost all numbers in [0.5,1) have a 

binary expansion of more than 52 digits but this reduced information is quite enough 
for our purpose). In this way the point ,z^) in A-dimension can be expressed 

by 52 X A binary numbers. Interleaving the binary coordinate values yields binary 
values. Gonnecting the binary values in their numerical order produces the Morton 
ordering. Then an appropriate coarse-graining leads to the subdivision of a box. For 
examples, when A = 2, the binary value corresponding (z^,z^) is ‘’^ 52 ^ 52 - 

The point is in first quadrant if (feji^^) = (1,1), in second quadrant if (&[,&f) = (0,1), 
in third quadrant if {bl,bf) = (0,0) and in fourth quadrant if (5i,&i) = (1,0). Applying 
this classification to a number of particles produces 2^ disjoint subsets of classified 
particles. Similarly, using blbfblb^ and ignoring the rest subgrid scales gives 4^ subsets 
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when N = 2. Taking the inverse affine transformation, a classification of the particles 
has been achieved. 

The crucial point being that by sorting the one dimensional transformed points, 
one keeps points in a box together without ever needing to introduce the boxes and 
particularly empty boxes. The complexity of the clustering is no worse than MNlogM 
in the number of points. Here M is number of particles, N is dimension and MNlogM 
is the cost of patching. Note NlogM is the cost of getting those points in a patch. 

7. Numerical simulations We perform numerical simulations to examine 
the efficiency and accuracy of the proposed filtering approaches. We introduce the 
test model in subsection 17.11 and obtain the reference solutions in subsection O We 
implement the PCF and APCF with cubature on Wiener space of degree m = 5 in 
subsection [731 Finally, in subsection 17.41 we investigate the prospective performance 
of PCF and APCF with cubature on Wiener space of degree m = 7. 

7.1. Test model It is very important to select a good example to examine the 
performance of the algorithms we have developed. Here we choose a forward model 
and observation process for which the analytic solution of the filtering problem is 
known and can be used to measure the accuracy of the various particle approximations. 

Our test model is the Ornstein-Uhlenbeck process [39] in three dimension: 

dX = -AXdt + gl 3 dW (7.1) 

/ (7 —a 0 \ 

where X = {x^x‘^x^Y, A= [ —p 1 0 1 , dW={dWidW 2 dW 3 Y and I 3 denotes the 

VO 0 

3x3 identity matrix. Here the superscript t denotes the transpose. The parameter 
values cr= 1, p = 0.28, 13 = 8/3 and 5 = 0.5 are chosen. The observations 

Yn = Xn+r]n, Pn N {3 , Rn) (7.2) 

are available at every inter-observation time T = 0.5. We study the cases in which the 
covariance of observation noise is = Rx I 3 for the values R = 10“^, 10“^ and 10“^. 

7.2. References 

7.2.1. Kalman filter The conditioned measure for Eqs. dTTD . (1731) is Gaussian 
and 7r„|„/=A/'(M„|„/,C„|„/) can be obtained from the Kalman filter. In this case, 
the prior covariance Cn\n-i satisfies the Riccati difference equation and its solution 
converges as n increases [3] . We take the covariance of the initial condition X (0) as 
the one step prediction from the limit of the Riccati equation solution so that Cn\n-i 
and Cn\n do not depend on n (but M^in-i and M„|„ depend on n). We see that the 
diagonal element of Cn\n-i are about 10“^ for all cases of i?= 10“^, 10“^, 10“^. The 
diagonal element of C„|„ are about 10“^ when R= 10“^, about 10“^ when R = 10~^ 
and about 10“^ when i? = 10“^. 

In this filtering problem, we first investigate where are the observations. We 
apply the Kalman filter for l<n<10® and calculate the values of Di,D 2 and D 3 
satisfying yn = -|- (Hi D 2 D^)* ■ ydiag(C'„|„_i), where is determined by one 

trajectory of the dynamics (EH together with a realisation of the observation noise 
rjn- The histograms in Fig. 17.11 show the distribution of these normalised distances 
between the observation and the prior mean when R= 10“^ (the cases of i? = 10“^ and 
R= 10“^ are similar and not shown). One can see that most of the observations are 
within two times of the standard deviations from the prior mean in each coordinate. 
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Fig. 7.1. The distribution of normalised distances between the observation and the prior mean 
when the noise covariance is Rn = 10~^ x/ 3 . 


Among the cases of 10®, there are 4,592,208 cases for which \Di\ > 1 for all i = 1,2,3 
at the same time. There are 37,574 cases for which \Di \ > 2 for all i at the same time, 
and 60 cases for which \Di \ >3 for all i at the same time. From the simulation, we 
understand the three cases in which the parameter value oi D = Di = D 2 = Ha is 1, 2 
and 3, are normal, exceptional and rare event, respectively. 

7.2.2. norm of the higher order central moments Here we aim to 
investigate the parameter regimes under which our cubature filters are likely (or un¬ 
likely) to outperform. In order to evaluate the computational error, one needs to 
define an error criterion relevant to the approximations. We realise that unfortu¬ 
nately a comparison between the evolving single trajectory and the corresponding 
posterior mean approximation, which is commonly used in the filtering context, is 
highly inappropriate for our purpose. This is because the cubature approximation is 
basically superior within approximating the tail behaviour or higher order moments 
of the probability distribution. Therefore we instead use the norm of the cen¬ 
tral moment to quantify the accuracy of the approximation obtained in the form of 
discrete measure. 

Let CP be the p-th central moment of A = {x^ x^Y, i.e., 

where ij = 1,2,3. The norm of is defined by 

\ 1/2 

E ■ T3) 

Whenp= 1, Eq. (ESI) is the Euclidean norm of the vector. When p = 2, it is equivalent 
with the Erobenius norm of the matrix. Let be the p-th central moment of a particle 
approximation, then the relative root mean square error 

Tmse% = \\CP-CPh/\\CPh (7-4) 

will be calculated to measure the accuracy of the moment approximations. 
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Table 7.1. The number of adaptive partition k for KLV with m = 5 



e=10-2 

e = 10-3 

e=10-4 

e = 10-5 

7 

o 

1—1 

II 

cd 

7 

31 

102 

344 

R=10-2 

10 

29 

101 

330 

R=10-3 

20 

48 

120 

329 



Fig. 7.2. The upper bound of the total error along with the adaptive partition when m = 5. 


7.2.3. Monte-Carlo Gaussian samples In our problem setting, the rmse 
errors are insensitive to the specific time interval between successive observations. 
Taking one arbitrary time interval, we study the cases of = 1,2,3 which correspond 
to normal, exceptional and rare event. The scenario initially may look somewhat 
artificial because, unlike the filtering in practice, the observational data is not gener¬ 
ated from realisations. However we emphasise it has been carefully designed, while 
keeping the practical relevance, in order to find the parameter regimes under which 
our approaches outperform Monte-Carlo methods and this will eventually turn out to 
be extremely helpful for a deeper understanding of the filtering problem. 

We perform Gaussian sampling to obtain three different Monte-Carlo approxi¬ 
mations of the posterior measure. For the first one, we draw samples from the prior 
measure and subsequently apply the bootstrap reweighting to obtain the posterior 
approximation. One can regard these bootstrap reweighted samples from the prior 
as SIR result. The second one is from the SISR algorithm under the transition ker¬ 
nel K{dxn\xn-i,yn) = lP(d 2 ;„,?/„), which is the optimal proposal in the sense of 
minimising the variance of the importance weights M- Finally, we draw samples 
directly from the posterior measures as the third one. Note, in all Monte-Carlo ap¬ 
proximations, neither truncation error due to numerical integration nor resampling 
error is induced for a fair comparison. The rmse errors (Tza of these Gaussian sam¬ 
ples are depicted in Fig. 17.41 when i?=I0“^, 0 = 1,2,3 and in Fig. 17.51 when D = l, 
R = 10“^, 10“^, 10“^. These results will be compared with the cubature filters. 

7.3. PCF and APCF with cubature on Wiener space of degree 5 

7.3.1. Choice of parameters We here implement the PCF and APCF at 
the flow level. In case of d = 3, i.e., when the system is driven by three independent 
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X lY, (m=5, R=10“\ D=1, e = 10“^) X |Y, (m=5, R=10“^, D=1, e = 10“^) 

n' 1:n ' ' n' 1:n ' ' 



n 


n 


(a) PCF using adaptive partition with e = 10 ^ 


(b) APCF using adaptive partition with e = 10 ® 


Fig. 7.3. The relative errors for the p-th moments of the evolutionary posterior. 


white noises, cubature on Wiener space of degree m = 3 and to = 5, with support size 
rim = 6 and rim = 28 respectively, are available. We apply the KLV operator with 
degree to = 5. 

Using the likelihood 5 ^" as the test function /, the adaptive partition X>(e,g^") 
satisfying Eq. (16.101) with Q™ in place of Q™ is analytically obtained for the system of 
Eq. (|7.1I) . Note that the likelihood 5 ^" is the density function of J\f{yn,Rn) and that 
the adaptive partition does not depend on ?/„ but on i?„. The number of iterations k 
as a function of e and R is listed in table 17.11 In this case, Fig. 17.21 reveals the upper 
bound of Eq. (|6.11l) tends to decrease as e becomes smaller. Therefore, by choosing 
9 to be the same order of e, one can combine the adaptive partition and the adaptive 
recombination to achieve a desired degree of accuracy to some extent. 

For the recombination of the PCF, Eq. (16.121) with f = g^" for all i.e.. 


sup 

Vn. 




m,j 

T>,{u,r) 


(/)-$ 






<0 


(7.5) 


is met so that the recombination does not depend on ?/„ but on Rn- We choose the 
recombination degree r = 5 and simulate the PCF for the cases of e = 10“^, 10“^ with 
6 » = 0.3xe. 

For the APCF, the tolerance r has to be chosen in addition to the parameters 
{e, 6 )}. The value of r varies in each case, but we choose it so that the SPL operator 
in Eq. ()6.7p allows 1/4^ 1/3 part of particles leap to the next observation time for all 
iterations except the first and last few steps. The remaining particles are reduced by 
the adaptive recombination, i.e., the recombination satisfies 


C’? I 

L U,{u,r),T 


(/)-$: 


T>,{u,r),T 


(/),Pr-q.g^ 


<9 


(7.6) 


where We again choose the recombination degree r = 5 and simulate 

the APCF for the cases of e = 10“^, 10“^ with 9 — 0.3 x e. 

While the value of D being fixed, we apply the PCF and APCF to obtain the 
values of Eq. (|7.4I) for the evolving posterior meausre. Fig. 17.31 shows that the per¬ 
formances of the two filtering algorithms are stable and that the numerical error 
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estimates of high order moments are insensitive to n (the rest cases produce similar 
plots and are not shown). 

In our numerical simulations, the number of patches needed to satisfy Eq. m 
in the PCF increases as the time partition approaches to the next observation time, 
eventually about 8^ ~ 16^. On the contrary, Eq. (17.61) in the APCF is satisfied with 2^ 
(< 10) patches in most cases. As a result, APCF saves computation time significantly 
compared with PCF. 


7.3.2. Dependence on the observation location 

When is fixed and Z? = 1,2,3 varies, the relative errors of the p-th 

moments of PCF and APCF are shown in Figs. 7.4(b)[ 7.4(i)[ 7.4(j)[ 7.4(k) We 
have implemented two cases of e=10“^ and e = 10“^. The recombination times are 
measured using Visual Studio with Intel 2.53 GHz processor (the autonomous DDEs 
are solved analytically). Fig. 17.41 reveals the following. 

• The prior approximation of PCF with e=10“^ shows similar accuracy with 
10"'^ Monte-Carlo sampling (Figs. 7.4(a) 


7.4(b)). 


The accuracy of the APCF prior approximation is in general worse than PCF 
especially for higher order moments (Fig. 7.4(b)). 

As the observation is located further far from the prior mean, i.e., as D 
increases, the posterior approximation obtained from Monte-Carlo bootstrap 
reweighting (SIR) becomes less accurate (Figs. 7.4(c)[ 7.4(d)[ 7.4(e)). As the 
number of samples M increases, the error reduction asymptotically scales as 
in all cases. 

Unlike the case of SIR, the accuracy of the importance samples (SISR) is 
not significantly influenced by the observation location as well as the number 
of samples (Figs. 7.4(f)[ 7.4(g)[ 7.4(h) |. This sample size insensitivity is 
presumably because SISR duplicates the samples in this parameter regime 
(compare with Fig. 7.5(i) I. 

The accuracy of the APCF posterior approximation is similar to PCF but 
APCF significantly reduces the recombination time which is insensitive to D 
(Figs. [7:4ti)|[7A0)|[7A(k)| ). 

The accuracy of the PCF and APCF posterior approximations with e = 
10“^ is similar to 10"^ Monte-Carlo reweighted samples (SIR) when D = l,2 


(Figs. 7.4(c; 


7.4(i) 

7.4(d) 

7.4(e) 

7.4(k) 


7.4(i) |7.4(d)[ 7.4(j)) and to 10® reweighted samples (SIR) when 


0 = 3 (Figs. 

The accuracy of the PCF and APCF posterior approximations with e= 10 


-3 


is similar to 10® Monte-Carlo reweighted samples when D = 1 (Figs. 7.4(c) 
7.4(i)), to 10® reweighted samples when D = 2 (Figs. 7.4(d)[ 7.4(j) I and to 


10*^ reweighted samples when D = Z (Figs. 7.4(e) 7.4(k)|. 

• The accuracy of the PCF and APCF posterior approximations with e = 10“® 
is superior to 10® importance samples (SISR) when D = 1 (Figs. 7.4(f)[ 7.4(i)), 
comparable to SISR when D = 2 (Figs. |7.4(g)| |7.4(j)|), inferior to SISR when 
D = 3 (Figs. |7.4(h)||7.4(k)D , in approximating higher order moments. 

There is an important insight to be gained from this experimental analysis. 
Though PCF produces a more accurate description of the prior measure than APCF, 
the one from this naive approximation of the prior is not better at approximating the 
posterior. The point is that one needs an extremely accurate representation of the 
prior in certain localities. While APCF delivers this without undue cost, the PCF 
method would have to deliver this accuracy uniformly and well out into the tail of 
the prior. As a result, for the posterior approximation, APCF can achieve a similar 
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Fig. 7.4. The prior and posterior approximations when R=10~^ is fixed and D = 1,2,3 varies. 
The top row is for the prior and the rest bottom three rows are for the posterior. The second row 
(SIR) and the third row (SISR) are from Monte- Carlo samples. The last bottom row is from cubature 
approximation when e = 10“^,10“®. 
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accuracy with PCF but using significantly less computational cost. 

In this example, the computational cost (recombination time) of PCF and APCF 
is uniform and irrespective of D for given e= 10“^, 10“^. However when one uses SIR 
to achieve the accuracy due to APCF with e = 10“^ in approximating higher order 
moments, one needs more computational resources (large number of particles) as D 
becomes bigger. One also cannot expect an accuracy improvement from SISR except 
the rare event case {D = 3). Therefore, in the reliability aspect, APCF is clearly 
advantageous over sequential Monte-Carlo methods. 


7.3.3. Dependence on the observation noise error When D = 1 is fixed 
and i?= 10“^,10“^,10“^ varies, the values of Eq. (17.41) for PCF and APCF are shown 
|7.5(j)[ 7.5(k)[ 7.5(1) We have implemented two cases of e= 10“^ and e= 10“^. 
1 reveals the following. 

The high order moment approximations errors due to Monte-Carlo Gaussian 
samples (direct sampling of the posterior) are insensitive to its covariance 
(recall the diagonal element of Cn\n is of the same order with the value of R) 
(Figs. |7.5(a)l |7.5(b)l [7)5(^ . 

As the likelihood becomes narrower, i.e., as R decreases, the posterior approx¬ 
imation obtained from Monte-Carlo bootstrap reweighting (SIR) becomes less 


accurate (Figs. 7.5(d) 7.5(e) 7.5(f) I. 


The accuracy of importance samples (SISR) tends to increase as R decreases 
(Figs. 7.5(g) 7.5(h) 7.5(i)|. In particular, when 10“^, the moment ap¬ 


proximations of SISR is comparable with those from direct sampling of the 
posterior except the mean (Figs. 7.5(c) 7.5(i) I. 

• As R decreases, the recombination time needed to achieve a given degree of 
accuracy becomes bigger for PCF but this is not the case for APCF, i.e., the 
recombination time for APCF is insensitive to R (Figs. 7.5(j) 7.5(k)[ 7.5(1)). 

The simulation shows that APCF again achieves a similar accuracy with PCF 
in all cases but, as the observation noise error decreases, APCF becomes more com¬ 
petitive than PCF for the solution of the intermittent data assimilation problem. It 
further shows that APCF is of higher order with respect to the recombination time 
and can achieve the given degree of accuracy with lower computational cost. 

Although Yn is there and measurable it is sometimes the case that it is actually 
computationally very expensive to compute and that actually the thing one can com¬ 
pute is the evaluation of likelihood for a number of locations. For example, consider 
a tracking problem for an object of moderate intensity and diameter that does a ran¬ 
dom walk and is moving against a slightly noisy background and is observed relatively 
infrequently. Its influence is entirely local. The likelihood function will be something 
like the Gaussian centred at the position of object but completely uninformative else¬ 
where in the space. The smaller the object, the tighter or narrower the Gaussian the 
harder the problem of finding the object becomes. One can compute the likelihood 
at any point in the space, but only evaluations at the location of the particle are 
informative. In that way one sees that 

1. The Yn is observable but only partially observed - and with low noise is very 
expensive to observe accurately as one has to find the particle. 

2. The likelihood can be observed at points in the space. 

In this sort of example it would be quite wrong to assume that, if we know the prior 
distribution of A„ then just because = Xn + r]n we know the posterior distribution 
at zero cost. For sequential Monte-Carlo methods, bootstrap reweighting would seem 
to give a much better approach. 
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Fig. 7.5. The posterior approximations when D = 1 is fixed and i?= 10“^, 10“^, 10“® varies. 
The top three rows are from Monte-Carlo samples. The first row is from direct sampling of pos¬ 
terior, the second is from SIR and the third is from SISR. The last bottom row is from cubature 
approximation when e = 10“^,10“®. In Fig. \ 7.5(l)\ the PCF with e = 10“® is not shown. 
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Table 7.2. The number of adaptive partition k for GHC with m = 7 



e=10-2 

e = 10-3 

e=10-4 

e = 10-5 
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Fig. 7.6. The upper bound of the total error along with the adaptive partition when m = 7. 


7.4. Prospective performance PCF and APCF with cubature on 
Wiener space of degree 7 A cubature formula on Wiener space of degree m>7 
is currently not available when d = 3. However, in our problem setting, we are able 
to emulate a prospective performance of higher order cubature formula using Gauss- 
Hermite quadrature. 

For the linear dynamics satisfying 

A(A) = FaA(0) + z/a, z^A-^Ar(0,QA) 

where Fa is a matrix, we define the forward operator 

C n \ n Um 

(7.7) 

i=l / i=l i=l 

where is a Gauss-Hermite cubature of degree m with respect to the law 

of i^A- The authors have seen that the performance of GHC is similar to KLV on the 
flow level when m = 3,5 and that Eq. (17.71) can be used as an alternative to Eq. (I4.23P 
in the application of PCE and APCF to the test model. 

The number of iterations k in the adaptive partition, obtained from using GHC 
with Gauss-Hermite cubature of degree m = 7 whose support size is Um = 64 in place 
of Q™, is shown in table 17.21 Here Fig. 17.61 corresponds to Fig. 17.21 and shows an 
enhanced accuracy. We apply GHC with degree m = 7 to obtain a prior and pos¬ 
terior approximation, where the recombination degree r = 5 and 0 = O.2xe is used. 
Our choice of t is again such that 1/4 ~ 1/3 of the particles are allowed to leap to 
the next observation time. The rmse errors (TTil) in the case of i?=10 D = 2 and 
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Fig. 7.7. The posterior approximations when H=10 ^ and D = 2. The left is from Monte- 
Carlo samples and the middle and right is from cubature approximation when e = 10“^,10“^. 


e = 10“^,10“^ are shown in Fig. 7.7(c) and this can be viewed as a result from PCF 
and APCF with cubature on Wiener space of degree m = 7. Its performance is in fact 
one higher order improvement for both accuracy and recombination time in view of 
Figs. 7.7(a)[ 7.7(b) From the simulation, we expect APCF with higher order cuba¬ 
ture formula can outperform Monte-Carlo approximations in any parameter regimes 
including the ones for which it used to be not so successful when it uses a low order 
cubature formula. This further highlights the strong necessity to find out cubature 
formula on Wiener space of degree m > 7 in order to solve the PDF or filtering problem 
with high accuracy in a moderate dimension. 


8. Discussion In this paper we introduce a hybrid methodology for the numer¬ 
ical resolution of the filtering problem which we named the adaptive patched cubature 
filter (APCF). We explore some of its properties and we report on a first attempt at 
a practical implementation. The APCF combines many different methods, each of 
which addresses a different part of the problem and has independent interest. At a 
fundamental level all of the methods use high order approaches to quantify uncertainty 
(cubature), and also to reduce the complexity of calculations (recombination based 
on heavy numerical linear algebra), while retaining explicit thresholds for accuracy 
in the individual computation. The thresholds for accuracy in a stage are normally 
achievable in a number of ways (e.g., small time step with low order, or large time step 
with high order) and the determination of these choices depends on computational 
cost. Aside from this use of the error threshold and choices based on computational 
efficiency there are several other points to observe in our development of this filter. 

1. One feature is the surprising ease with which one can adapt the computations 
to the observational data and so avoid performing unnecessary computations. 
In even moderate dimensions (we work in 3 -I- 1) this has a huge impact for the 
computation time while preserving the accuracy we achieve for the posterior 
distribution (Figs. [7A(i)| [7(4(j)| |7.4(k)[ Figs. |7.5(k)[ [A^ . It is an 

automated form of deterministic high order importance sampling which has 
wider application than the one explored in this paper, for instance it is used 
to deliver accurate answers to PDF problems with piecewise smooth test 
function in the example developed in [29) . 
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2. Another innovation allowing a huge reduction in computation is the ability to 
efficiently patch the particles in the multiple dimensional scenario. Although 
the problem might at first glance seem elementary, it is in fact the problem of 
data classification. To resolve this problem we introduce an efficient algorithm 
for data classification based on extending the Morton order to floating point 
context. This method has now also been used effectively for efficient function 
extrapolation [34] . 

3. The KLV algorithm is at the heart of a number of successful methods for 
solving PDEs in moderate dimension [33]. In each case, something has to be 
done about the explosion of scenarios after each time step; this in turn has to 
rely on and understanding the errors. In this paper we take a somewhat dif¬ 
ferent approach to the literature m in the way we use higher order Lipschitz 
norms systematically to understand how well functions have been smoothed, 
and to measure the scales on which they can be well approximated by poly¬ 
nomials. This has the consequence that one can be quite precise about the 
errors one incurs at each stage in the calculation. In the end this is actually 
quite crucial to the logic of our approach since an efficient method requires 
optimisation over several parameters - something that is only meaningful if 
there are (at least in principle) uniform estimates on errors. As a result of 
this perspective, we do not follow the time steps and analytic estimates intro¬ 
duced by Kusuoka in [26] although we remain deeply influenced by balancing 
the smoothing properties of the semigroup with the use of non-equidistant 
time steps. 

4. The focus on Lipschitz norms makes it natural to apply an adaptive approach 
to the recombination patches as well as to the prediction process. In both 
cases we can be lead by the local smoothness of the likelihood function as 
sampled on our high order high accuracy set of scenarios. 

5. We have focussed our attention on the quality of the tail distribution of the 
approximate posterior we construct. This is important in the filtering problem 
because a failure to describe the tail behaviour of the tracked object implies 
that one will lose the trajectory all together at some point. These issues are 
particularly relevant in high dimensions as the cost of increasing the frequency 
of observation can be prohibitive. If one wishes to ensure reliability of the 
filter in the setting where there is a significant discrepancy between the prior 
estimate and the realised outcome over a time step then our APCF with 
cubature on Wiener space of degree 5 already shows in the three dimensional 
example that it can completely outperform sequential importance resampling 
Monte-Carlo approach. The absence, at the current time, of higher order 
cubature formulae is in this sense very frustrating as the evidence we give 
suggests that higher degree methods will lead to substantial further benefits 
for both computation and accuracy. 

In putting this paper together we have realised that there are many branches in 
this algorithm that can be improved, in particular some parts of the adaptive process 
and also the recombination (a theoretical improvement in the order of recombination 
has recently been discovered [38]). There are also large parts that can clearly be 
parallelised. We believe that there is ongoing scope for increasing the performance of 
the APCF. 
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